Abstract

The historical research line on the algebraic properties of structured CF languages, initiated by McNaughton's
Parenthesis Languages, has recently attracted much renewed interest with the Balanced Languages, the Visibly
Pushdown Automata languages (VPDA), the Synchronized Languages, and the Height-deterministic
ones. Such families preserve to a varying degree the basic algebraic properties of Regular languages:
boolean closure, closure under reversal, under concatenation, and Kleene star. We prove that the VPDA
family is strictly contained within the Floyd Grammars (FG) family, historically known as operator precedence.
Languages over the same precedence matrix are known to be closed under boolean operations, and
are recognized by a machine whose pop or push operations on the stack are purely determined by terminal
letters. We characterize VPDAs as the subclass of FG having a peculiarly structured set of precedence
relations, and balanced grammars as a further restricted case. The non-counting invariance property of FG
has a direct implication for VPDA too.



1. Introduction 

From the very beginning of formal language science, research has struggled with the wish and need to
extend as far as possible the nice and powerful properties of regular languages (specifically closure properties).
A major initial step was made by McNaughton with parenthesis grammars [17], characterized by
enclosing any right-hand side within a pair of parentheses; the alphabet is the disjoint union of internal letters
and the pair. By considering, instead of strings, the stencil or skeletal trees encoded by parenthesized strings,
some typical properties of regular languages that do not hold for CF languages are still valid: uniqueness
of the minimal grammar, and boolean closure within the class of languages having the same rule stencils.
Further mathematical developments of such ideas have been pursued in the setting of tree automata [20].
Several decades later, novel motivation arose for the investigation of parenthesis-like languages from the
interest in mark-up languages such as XML. The balanced grammars and languages [2] generalize the
parenthesis grammars in two ways: several pairs of parentheses are allowed, and the right-hand side of the
grammar rules permits a regular expression over nonterminal and internal symbols to occur between matching
parentheses. The property of uniqueness of the minimal grammar is preserved, and the family has the
property of closure w.r.t. concatenation and Kleene star, which was missing in parenthesis languages. Clearly
balanced as well as parenthesis languages are closed under reversal.

Model checking and static program analysis provide an entirely different long-standing motivation for such
families of languages: those that extend the typical regular properties to infinite-state pushdown systems.
To the best of our knowledge the seminal paper of this "new era" is [1], which defines visibly pushdown
automata and languages (VPDA), a subclass of realtime pushdown automata and deterministic context-free
languages. The input alphabet is partitioned into three sets named calls, returns and internals, and the decision
of the type of move to perform (push, pop, or a stack-neutral move) is determined by the membership
of the current input letter; in other words the type of a move is solely input-driven. VPDA languages extend
balanced grammars in two ways that are important for modelling symbolic program execution: they
allow parentheses to remain unbalanced to represent an execution state where some procedures have not
returned, and a call symbol can be matched by two or more return symbols to represent procedures with
multiple exits. For each partitioned alphabet the corresponding language family is closed under the regular
operations, including complement. VPDA's can be determinized, and reversal produces a VPDA with calls
and returns interchanged. We observe that the intended applications to static program analysis need closure
under reversal in order to compute the pre- and post-reachability sets.

Spurred by this new approach, a variety of extensions and specializations of the original class have been
proposed and investigated. Among them, we mention the following. The synchronized pushdown automata
[?], instead of the fixed 3-partition of VPDA's, use a finite transducer that determines the type of move the
PDA must perform.

The height-deterministic automata [18] further extended the previous idea by considering the class of PDAs
characterized by the same integer-valued function returning the height of the stack for each input string;
within this approach the deterministic and the real-time cases are singled out for having richer closure properties.
Last, the synchronized grammars [?] are a more comprehensive model that uses an input-driven
pushdown transducer to decide the type of a move. Not surprisingly, such more general models lose certain
nice properties of VPL, in particular the closure under reversal, concatenation, and Kleene star.
Shortly after McNaughton's results, we investigated similar closure properties of Floyd's operator precedence
Grammars [12]¹ (FG), an elegant precursor of LR(k) grammars, also exploited by one of us in his work
on grammar inference [6]. For any given precedence matrix a syntax tree stencil is defined a priori for any
word that is generated by any FG having the same precedence matrix. The family of such Floyd grammars
and the related languages are a boolean algebra [9]. We also extended the notion of non-counting regular
language of McNaughton and Papert [19] to the parenthesis languages [?] and to FG.
In this paper we resume the study of FG in the perspective of the cited grammatical models. We show that
VPDA is a special case of FG characterized by a very restricted structure of the precedence relations, thus
providing a new characterization of VPDA in terms of operator grammars. Further restrictions are shown
for the case of balanced languages. Then we compare FG with the height-deterministic family, showing
strict inclusion, and that reversal closure is lost by that generalization.

The paper is structured as follows: Section 2 provides the essential definitions of the main classes of languages
(defined through automata and/or grammars) that will be considered in this paper (others will be referred to
only on the basis of previous literature); Section 3 investigates the mutual inclusion relations among
them. Section 4 compares the same classes of languages w.r.t. their closure properties. The conclusion
mentions that the non-counting invariance property of FG has a direct implication for VPDA too and shows
that the whole picture of such language families deserves further analysis to answer a few remaining open
issues.

2. Basic definitions 

We list the essential definitions of parenthesis and balanced grammars, VPDA, height-deterministic
automata, and Floyd grammars. For brevity, other classes are not defined here because they can be somewhat
put in relation with the above "basic" ones. They are nevertheless taken into consideration in Section 4. The
same name is given to a class of devices (grammars or automata) and to the class of languages that can be
defined by means of them.

1 We propose to name them Floyd grammars to honor the memory of Robert Floyd and also to avoid confusion with other
similarly named but quite different types of precedence grammars.

The empty string is $\varepsilon$, the terminal alphabet is $\Sigma$. For a string $x$ and a letter $a$, $|x|_a$ denotes the number of
occurrences of letter $a$, and we extend the notation to $|x|_\Delta$, for a set $\Delta \subseteq \Sigma$. Let $first(x)$ and $last(x)$ denote
the first and last letter of $x \neq \varepsilon$. The projection of a string $x \in \Sigma^*$ on $\Delta$ is denoted $\pi_\Delta(x)$.

The operators union, concatenation, and Kleene star are called regular. A regular expression is a formula
written using the regular operators, parentheses and letters from a specified alphabet.

A Context-Free (CF) grammar is a 4-tuple $G = (V_N, \Sigma, P, S)$, where $V_N$ is the nonterminal alphabet, $P$ the
rule set, and $S$ the axiom. An empty rule has $\varepsilon$ as the right part. A renaming rule has one nonterminal as
right part. A grammar is invertible if no two rules have identical right parts.

A rule has the operator form if its right part has no adjacent nonterminals, and an operator grammar (OG)
contains just such rules. Any CF grammar admits an equivalent OG, which can also be assumed to be
invertible [14].

For a CF grammar $G$ over $\Sigma$, the associated parenthesis grammar [17] $\tilde{G}$ has the rules obtained by enclosing
each right part of a rule of $G$ within the parentheses '[' and ']' that are assumed not to be in $\Sigma$.

A balanced grammar [2] is a CF grammar that has a terminal alphabet partitioned into $\Sigma = \Sigma_{par} \cup \Sigma_i$, where
$\Sigma_{par} = \{a, \bar{a}, b, \bar{b}, \ldots\}$ is a set of matching parentheses and the elements of $\Sigma_i$ are named internal. Let $V_N$
be the nonterminal alphabet. Every rule of a balanced grammar has the form $X \to a \alpha \bar{a}$ or $X \to \alpha$, where
$\alpha$ is a regular expression over $V_N \cup \Sigma_i$. The corresponding family is denoted BALAN.

A pushdown automaton PDA $\mathcal{A}$ over an alphabet $\Sigma$ is a tuple $\mathcal{A} = (Q, \Sigma, \Gamma, \delta, q_0, F)$, where the initial state is
$q_0 \in Q$ and $F \subseteq Q$ are the final states. $\Gamma$ is the stack alphabet containing $\bot$, the stack bottom symbol. The

transition relation is 

5CQxrx(EUe)xQx(r\ 



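The notation [18] $pX \xrightarrow{a} q\alpha$ is equivalent to $(p, X, a, q, \alpha) \in \delta$.
A PDA is called realtime (RPDA) if $pX \xrightarrow{a} q\alpha$ implies $a \neq \varepsilon$.

A PDA is called deterministic (DPDA) if for every $p \in Q$, $X \in \Gamma$ and $a \in \Sigma \cup \{\varepsilon\}$ we have
$|\{q\alpha \mid pX \xrightarrow{a} q\alpha\}| \le 1$, and if $pX \xrightarrow{\varepsilon} q\alpha$ and $pX \xrightarrow{a} q'\alpha'$ then $a = \varepsilon$.
A realtime deterministic PDA is named a RDPDA.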

The set $Q\Gamma^*$ is the set of configurations of a PDA, with initial configuration $q_0\bot$.
The labelled transition system generated by $\mathcal{A}$ is the edge-labeled directed graph

$\left( Q\Gamma^*\bot,\ \bigcup_{a \in \Sigma \cup \{\varepsilon\}} \xrightarrow{a} \right)$

Given a string $w \in \Sigma^*$, we write $p\alpha \stackrel{w}{\Longrightarrow} q\beta$ if there exists a finite $w'$-labelled path, $w' \in (\Sigma \cup \{\varepsilon\})^*$, from
$p\alpha$ to $q\beta$, and $w$ is the projection of $w'$ onto $\Sigma$. Notice that according to [18] the $w'$-labelled path includes
transitions of the type —
An A is complete if Vw G S*, qo± ==> go. 

The language recognized by $\mathcal{A}$ is $L(\mathcal{A}) = \{w \in \Sigma^* \mid q_0\bot \stackrel{w}{\Longrightarrow} p\alpha,\ p \in F\}$.
A PDA $\mathcal{A}$ is normalized [18] if

1. A is complete; 

2. for all $p \in Q$, all rules in $\delta$ of the form $pX \xrightarrow{a} q\alpha$ either satisfy $a \in \Sigma$, or all of them satisfy $a = \varepsilon$,
but not both;

3. every rule in $\delta$ is of the form

• $pX \xrightarrow{a} q$

• $pX \xrightarrow{a} qX$

• $pX \xrightarrow{a} qYX$

where $a \in \Sigma \cup \{\varepsilon\}$.

For a normalized PDA moves are named push if $|\alpha| = 2$, pop if $|\alpha| = 0$, and internal if $|\alpha| = 1$. The
normalization preserves the characteristics of DPDA, RPDA and RDPDA devices.

Height-determinism 

Let $w \in (\Sigma \cup \{\varepsilon\})^*$. The set $N(\mathcal{A}, w)$ of stack heights reached by $\mathcal{A}$ after reading $w$ is
$\{|\alpha| \mid q_0\bot \stackrel{w}{\Longrightarrow} q\alpha\bot\}$. A height-deterministic PDA (HPDA) is a PDA that is normalized and such that
$|N(\mathcal{A}, w)| \le 1$ for every $w \in (\Sigma \cup \{\varepsilon\})^*$.

The families of height-deterministic PDAs, DPDAs, and RDPDAs (and languages) are resp. denoted by
HPDA, HDPDA, and HRDPDA.

A normalized DPDA is an HDPDA and the language families HPDA and CF coincide [18].

Two HPDAs $\mathcal{A}_1$ and $\mathcal{A}_2$ over the same alphabet $\Sigma$ are in the equivalence relation H-synchronized, denoted
by $\mathcal{A}_1 \sim_H \mathcal{A}_2$, if $N(\mathcal{A}_1, w) = N(\mathcal{A}_2, w)$ for every $w \in (\Sigma \cup \{\varepsilon\})^*$.

Let $[\mathcal{A}]_{\sim_H}$ denote the equivalence class containing the HPDA $\mathcal{A}$, and $\mathcal{A}$-HPDA denote the class of languages
recognized by any HPDA H-synchronized with $\mathcal{A}$.

Visibly pushdown automata 

A visibly pushdown (VP) [1] alphabet is a 3-tuple $\Sigma = (\Sigma_c, \Sigma_r, \Sigma_i)$, with $\Sigma$ the disjoint union of the
three sets. Elements of the three sets are resp. termed calls, returns and internal letters. A VP automaton
VPDA is a PDA $\mathcal{A} = (\Sigma, Q, q_0, \Gamma, \delta, F)$, where $\Sigma$ is a VP alphabet. The transition relation is

$\delta \subseteq (Q \times \Sigma_c \times Q \times (\Gamma \setminus \{\bot\})) \cup (Q \times \Sigma_r \times \Gamma \times Q) \cup (Q \times \Sigma_i \times Q)$

which can be readily seen to specialize the previous definition for a general PDA.
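
Because the type of move is fixed by the input letter, the stack height reached after reading a word depends on the word alone, whatever the states and transitions are. The following minimal Python sketch illustrates this observation. The function name `stack_height` and the encoding of the VP alphabet as Python sets are illustrative assumptions, and a return read on an empty stack is assumed to leave the bottom symbol in place.

```python
def stack_height(word, calls, returns):
    """Number of stack symbols above the bottom marker after reading `word`.

    Assumption: letters that are neither calls nor returns are internal.
    """
    height = 0
    for letter in word:
        if letter in calls:
            height += 1                      # push move
        elif letter in returns and height > 0:
            height -= 1                      # pop move on a non-empty stack
        # internal letters, and returns on an empty stack, keep the height
    return height


CALLS, RETURNS = {"c"}, {"r"}
heights = [stack_height(w, CALLS, RETURNS) for w in ("", "ccsr", "rrc", "crcr")]
```

Two VPDAs over the same VP alphabet therefore reach the same heights on every word, which is the intuition behind viewing VPDA as a special case of height-determinism.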

Floyd or operator precedence grammars 

The definitions for operator precedence grammars, here renamed Floyd Grammars (FG), are from |@]. 
(See also [13] for a recent presentation.) 

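For a nonterminal $A$ of an OG $G$, the left and right terminal sets are

$\mathcal{L}_G(A) = \{a \in \Sigma \mid A \stackrel{*}{\Rightarrow} Ba\alpha\} \qquad \mathcal{R}_G(A) = \{a \in \Sigma \mid A \stackrel{*}{\Rightarrow} \alpha aB\}$

where $B \in V_N \cup \{\varepsilon\}$ and $\stackrel{*}{\Rightarrow}$ denotes, as usual, a derivation. The two definitions are extended to a set $W$ of
nonterminals and to a string $\beta \in V^+$ via

$\mathcal{L}_G(W) = \bigcup_{A \in W} \mathcal{L}_G(A)$ and $\mathcal{L}_G(\beta) = \mathcal{L}_{G'}(D)$

where $D$ is a new nonterminal and $G'$ is the same as $G$ except for the addition of the rule $D \to \beta$. Finally
$\mathcal{L}_G(\varepsilon) = \emptyset$. The definitions for $\mathcal{R}$ are similar.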
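For an OG $G$, let $\alpha, \beta \in (V_N \cup \Sigma)^*$ and $a, b \in \Sigma$; three binary operator precedence (OP) relations are
defined:

equal precedence: $a \doteq b$ iff $\exists A \to \alpha a B b \beta$, $B \in V_N \cup \{\varepsilon\}$;

takes precedence: $a \gtrdot b$ iff $\exists A \to \alpha D b \beta$, $D \in V_N$ and $a \in \mathcal{R}_G(D)$;

yields precedence: $a \lessdot b$ iff $\exists A \to \alpha a D \beta$, $D \in V_N$ and $b \in \mathcal{L}_G(D)$.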



For an OG $G$, the operator precedence matrix (OPM) $M = OPM(G)$ is a $|\Sigma| \times |\Sigma|$ array that to each
ordered pair $(a, b)$ associates the set $M_{ab}$ of OP relations holding between $a$ and $b$. Given two OPM's $M_1$
and $M_2$, we define

$M_1 \subseteq M_2 \iff M_{1,ab} \subseteq M_{2,ab}, \qquad M = M_1 \cup M_2 \iff M_{ab} = M_{1,ab} \cup M_{2,ab}, \quad \forall a, b.$

$G$ is a Floyd grammar FG if, and only if, $OPM(G)$ is a conflict-free matrix, i.e., $\forall a, b$, $|OPM(G)_{ab}| \le 1$.

Two matrices are compatible if their union is conflict-free.
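
The matrix operations just defined are simple enough to be sketched in a few lines of Python. The encoding is an assumption made only for illustration: a matrix is a dictionary from ordered pairs of letters to sets of relations, and the three relations are written as the strings "<", "=" and ">"; the function names are hypothetical.

```python
def union(m1, m2):
    """Entry-wise union of two precedence matrices."""
    pairs = set(m1) | set(m2)
    return {p: m1.get(p, set()) | m2.get(p, set()) for p in pairs}


def contained(m1, m2):
    """True iff every entry of m1 is a subset of the same entry of m2."""
    return all(rels <= m2.get(pair, set()) for pair, rels in m1.items())


def conflict_free(m):
    """True iff at most one relation holds between any two letters."""
    return all(len(rels) <= 1 for rels in m.values())


def compatible(m1, m2):
    """True iff the union of the two matrices is conflict-free."""
    return conflict_free(union(m1, m2))


m1 = {("a", "b"): {"="}, ("a", "a"): {"<"}}
m2 = {("a", "b"): {"="}, ("b", "b"): {">"}}
m3 = {("a", "a"): {">"}}   # clashes with m1 on the pair (a, a)
```

With these sample matrices, `m1` and `m2` are compatible, while `m1` and `m3` are not, because their union would contain both "<" and ">" for the pair (a, a).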

A FG is in Fischer normal form [?] if it is invertible, the axiom $S$ does not occur in the right part of any
rule, and there are no renaming rules, except those with left part $S$ (if any).
For the reader's convenience the acronyms are collected in the table:



BALAN    balanced grammar
CF       context-free
DPDA     deterministic pushdown automaton
FG       Floyd grammar
HDPDA    height-deterministic deterministic pushdown automaton
HPDA     height-deterministic pushdown automaton
HRDPDA   height-deterministic realtime deterministic pushdown automaton
OG       operator grammar
OPM      operator precedence matrix
REG      regular language
RDPDA    realtime deterministic pushdown automaton
RPDA     realtime pushdown automaton
PDA      pushdown automaton
VPDA     visibly pushdown automaton



3. Containment relations 



First we recall some of the relevant known [?, ?, 16, 2] containment relations between some recent
language families, then we position FG within the picture. The main strict inclusions are:

$REG \subset BALAN \subset VPDA \subset HRDPDA = RDPDA \subset HDPDA = DPDA$

Notice that the above inclusions preserve the structural properties of the languages: for instance if the
partition of a VP alphabet places a letter in $\Sigma_c$ and therefore associates a push move to it, the corresponding
HDPDA automaton too performs a push move on that letter.

The first [?] and second [?] family of Caucal, as well as the one of Fisman and Pnueli [?], fall in between
VPDA and DPDA, but lack of space prevents a detailed presentation.

Next we focus on FG languages. It is well-known that $FG \subset DPDA$. On the other hand, FG includes
non-realtime deterministic languages such as $L_1 = \{a^m b^n c^n d^m \mid m, n \ge 1\} \cup \{a^m b^+ e d^m \mid m \ge 1\}$.
Observing that $L_2 = \{a^n c a^n \mid n > 0\}$ is in HRDPDA but not in FG, since, by an elementary application
of the pumping lemma, this would imply a precedence conflict, we have:

Proposition 3.1. The families of FG and HRDPDA languages are incomparable.

Our main result is that the VPDA languages are a well-characterized special case of FG languages. First we
give a construction from a VPDA to a FG having a certain type of precedence matrix, second we construct a
VPDA for any FG with such matrices. At last we include also BALAN in the matrix-based characterization.
We need to analyze the structure of VPDA strings. A string in $\{c, r\}^*$ is well parenthesized if it reduces to
$\varepsilon$ via the cancellation rule $cr \to \varepsilon$.

Let $\rho$ be the alphabetical mapping from $\Sigma_c \cup \Sigma_r \cup \Sigma_i$ to $\{c, r\}$ defined by $\rho(c_j) = c$, $\forall c_j \in \Sigma_c$, $\rho(r_j) = r$,
$\forall r_j \in \Sigma_r$, and $\rho(s_j) = \varepsilon$, $\forall s_j \in \Sigma_i$. A non-empty string $x \in \Sigma^*$ is well balanced if $\rho(x)$ is well
parenthesized; it is well closed if in addition $first(x) \in \Sigma_c$ and $last(x) \in \Sigma_r$.
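
The three notions just introduced can be checked mechanically. The sketch below, in Python, is only an illustration: the function names and the set-based encoding of calls and returns are assumptions, and any letter that is neither a call nor a return is treated as internal. Reduction to the empty string by the cancellation rule is tested with the usual depth counter.

```python
def rho(word, calls, returns):
    """Map calls to 'c', returns to 'r', and erase internal letters."""
    return "".join(
        "c" if x in calls else "r" if x in returns else "" for x in word
    )


def well_parenthesized(cr):
    """True iff a string over {c, r} reduces to the empty string via cr -> empty."""
    depth = 0
    for x in cr:
        depth += 1 if x == "c" else -1
        if depth < 0:          # an r with no c left to cancel
            return False
    return depth == 0


def well_balanced(word, calls, returns):
    return len(word) > 0 and well_parenthesized(rho(word, calls, returns))


def well_closed(word, calls, returns):
    return (
        well_balanced(word, calls, returns)
        and word[0] in calls
        and word[-1] in returns
    )
```

For instance, with calls {c} and returns {r}, the string `scsr` is well balanced but not well closed, whereas `csr` is both.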
Let $\mathcal{A} = (Q, \Sigma, q_0, \Gamma, \delta, Q_F)$ be a VPDA, with $\Sigma = \Sigma_c \cup \Sigma_r \cup \Sigma_i$.

Lemma 3.2. Any string $x \in L(\mathcal{A})$ can be factorized as
$x = y c_0 z$ or $x = y$, with $c_0 \in \Sigma_c$, such that

1. $y = u_1 w_1 u_2 w_2 \cdots u_k w_k$, $k \ge 0$, where $u_j \in (\Sigma_i \cup \Sigma_r)^*$, and $w_j \in \Sigma^*$ is a, possibly missing,
well-closed string;

2. $z = v_1 c_1 v_2 c_2 \cdots c_{r-1} v_r$, $r \ge 0$, where $c_j \in \Sigma_c$ and $v_j \in \Sigma^*$ is a, possibly null, well-balanced
string.
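
The factorization can be computed by a single left-to-right scan: $c_0$ is the first call that is never matched by a later return. The following Python sketch shows this under illustrative assumptions (the function name `factorize`, calls and returns given as sets, every other letter internal); it works on the input string only and does not check membership in $L(\mathcal{A})$.

```python
def factorize(word, calls, returns):
    """Split `word` as (y, c0, z) at its first unmatched call.

    Returns (word, None, None) when every call is matched, i.e. x = y.
    """
    open_calls = []                       # positions of still unmatched calls
    for pos, letter in enumerate(word):
        if letter in calls:
            open_calls.append(pos)
        elif letter in returns and open_calls:
            open_calls.pop()              # matches the most recent open call
        # internal letters and unmatched returns change nothing
    if not open_calls:
        return word, None, None
    first = open_calls[0]
    return word[:first], word[first], word[first + 1:]
```

Since $c_0$ is never popped, every return occurring in $z$ matches a call of $z$, so $z$ consists of well-balanced strings separated by unmatched calls, as stated in item 2 of the lemma.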

Proof Let the transitions from state $q$ to $q'$ be labelled as follows: $(r, \bot)$ denotes a move of type $(q, r, \bot, q') \in \delta_r$;
$(r, Z)$ denotes a move of type $(q, r, Z, q') \in \delta_r$ with $Z \neq \bot$; $\frac{c}{Z}$ denotes a move of type $(q, c, q', Z) \in \delta_c$;
$s$ denotes a move of type $(q, s, q') \in \delta_s$.

We examine the possible sequences of moves of a suitable VPDA $\mathcal{A}$ that for convenience is non-deterministic
(determinization is always possible [1]). We only discuss the case $x = y c_0 z$, since the case $x = y$ is simpler.
The computation starts with a series of moves in $\{(r, \bot) \mid s\}^*$, which scan the prefix $u_1$ and leave the stack
empty.

Then the machine may do a series of moves to scan string $w_1$. The first move is of type $\frac{c}{Z_1}$. The move is
possibly followed by a nested computation scanning a well-balanced string, and at last by a move of type
$(r, Z_1)$. The effect is to scan a well-balanced string $w_1$. Clearly the nested computation may also include
internal moves.

After scanning $w_1$ the stack is empty, and the computation may scan $u_2$, and so on, until $w_k$ is scanned.
Alternatively and non-deterministically, when the stack is empty, the machine may perform a move $\frac{c_0}{Z_U}$, thus
entering the phase that scans string $z$. We denote as $Z_U$ a symbol written on the stack, which will never be
touched by a subsequent pop move. In other words, $c_0$ is non-deterministically assumed to be an unmatched
call.

Then the $z$ phase non-deterministically scans a well-balanced string $v_1$. Then, again non-deterministically,
it may perform a move $\frac{c}{Z_U}$. Then it may scan another well-balanced string $v_2$, and so on, ending with a
stack in $\bot Z_U^+$.

At any time, when the machine enters a final state, it may halt and recognize the scanned input.

Clearly string $y$ is the longest prefix such that the accepting computation ends with empty stack. For
simplicity, without loss of generality, we assume that no transition enters the initial state $q_0$. For convenience
we shall denote by a subscripted letter $q$ the states traversed while scanning $y$, and by a subscripted letter $p$
the states traversed in the computation of $c_0 z$. The state set is thus partitioned into $Q = \{q_0\} \cup Q_q \cup Q_p$.



Since VPL's are CF languages, previous papers (e.g. 112 111 ) have also used grammars to define them, 
but such grammars are not OG or have precedence conflicts; instead, we present a construction producing a 
grammar with the required properties. 

Theorem 3.3. For any visibly pushdown automaton A a Floyd grammar G such that L(G) = L(A) can 
be effectively constructed. 



Proof First we construct the grammar, then we prove that it is an FG, and lastly that it is equivalent to $\mathcal{A}$.

Grammar construction.

The rules are keyed to the factorization of Lemma 3.2 and are listed in Tables 1, 2, 3 and 4. The scheme of
a sample syntax tree produced by the grammar, for a string factorized as in Lemma 3.2, is shown in Fig. 1.

Figure 1: Schema of a syntax tree generated by the precedence grammar constructed in Theor. 3.3.

Figure 2: Total VP precedence matrix $M_T$.

Nonterminals of class $Y$ generate a string such that the automaton, parsing it, starts and ends with
empty stack. Nonterminals of classes $B_1$, $B_2$ derive a well-balanced (but not necessarily well-closed) string.
Nonterminals of class $Z$ derive a string such that, starting with a non-empty stack of the form $\bot Z_U^+$, the
stack never pops a $Z_U$ and at last contains a string in $\bot Z_U^+$.

The nonterminal symbols of the grammar are denoted by a pair of states $\langle q_i, q_j\rangle$ or $\langle p_i, p_j\rangle$, or by a triple
$\langle q_i, Z, q_j\rangle$ or $\langle p_i, Z, p_j\rangle$, with $Z \in \Gamma$. Intuitively, a nonterminal of the generic form $\langle r_i, \ldots, r_j\rangle$ generates
a terminal string $u$ if, and only if, there is a computation of the machine from the left state $r_i$ to the right
state $r_j$ which reads the same string and never modifies the initial stack. Furthermore, nonterminals $\langle q_i, q_j\rangle$
leave the stack unchanged; nonterminals $\langle p_i, p_j\rangle$ at most increase the number of $Z_U$'s; and nonterminals
$\langle q_i, Z, q_j\rangle$ or $\langle p_i, Z, p_j\rangle$ denote that the computation starts and ends with $Z$ on the top and generates a
well-balanced terminal string $w$.

To construct the rules we examine the transitions of the VPDA. In what follows, calls, returns and internal
letters are respectively denoted $c$, $r$ and $s$; $Z$, $W$ are stack symbols different from $\bot$. Notice that the

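Table 1: Productions of the axiom.

case | transitions | rules
$S \to YcZ$ | $\delta(q_i, c_0) \ni (p_j, Z_U)$ | $S \to \langle q_0, q_i\rangle c_0 \langle p_j, p_f\rangle$, $\forall p_f \in F$
$S \to Y$ | | $S \to \langle q_0, q_f\rangle$, $\forall q_f \in F$
$S \to Yc$ | $\delta(q_i, c_0) \ni (p_f, Z_U)$, $p_f \in F$ | $S \to \langle q_0, q_i\rangle c_0$
$S \to cZ$ | | $S \to c_0 \langle p_j, p_f\rangle$, $\forall p_f \in F$
$S \to c_0$ | $\delta(q_0, c_0) \ni (p_f, Z_U)$, $p_f \in F$ | $S \to c_0$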
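Table 2: Productions of nonterminals of class Y (deriving the maximal prefix ending with empty stack).

case | transitions | rules
$Y \to s$ | $\delta(q_0, s) \ni q_i$ | $\langle q_0, q_i\rangle \to s$
$Y \to r$ | $\delta(q_0, r, \bot) \ni q_i$ | $\langle q_0, q_i\rangle \to r$
$Y \to Ys$ | $\delta(q_i, s) \ni q_j$ | $\langle q_0, q_j\rangle \to \langle q_0, q_i\rangle s$
$Y \to Yr$ | $\delta(q_i, r, \bot) \ni q_j$ | $\langle q_0, q_j\rangle \to \langle q_0, q_i\rangle r$
$Y \to cBr$ | $\delta(q_0, c) \ni (q_t, Z)$ and $\delta(q_k, r, Z) \ni q_h$ | $\langle q_0, q_h\rangle \to c \langle q_t, Z, q_k\rangle r$
$Y \to cr$ | $\delta(q_0, c) \ni (q_t, Z)$ and $\delta(q_t, r, Z) \ni q_h$ | $\langle q_0, q_h\rangle \to cr$
$Y \to YcBr$ | $\delta(q_i, c) \ni (q_j, Z)$ and $\delta(q_m, r, Z) \ni q_n$ | $\langle q_0, q_n\rangle \to \langle q_0, q_i\rangle c \langle q_j, Z, q_m\rangle r$
$Y \to Ycr$ | $\delta(q_i, c) \ni (q_j, Z)$ and $\delta(q_m, r, Z) \ni q_n$ and $q_j = q_m$ | $\langle q_0, q_n\rangle \to \langle q_0, q_i\rangle cr$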

Table 3: Productions for nonterminals of classes $B_1$ and $B_2$, generating well-balanced strings. (The case $B_2$ just differs with respect
to the state set, which is $Q_p$ instead of $Q_q$.)



case 


transitions 


rules 


B -> 


BcBr 


%i,c) 


3 (qj,Z) and £(g m ,r 


, Z) 3 q n 


(g, q n ) -> 


(q,qi)c(qj,Z,q m )r,Vq G Q q 


B -> 


Bcr 


%i,c) 


9 and<%,r, 


Z) 3 q n 


(q, q n ) -> 


fa,qi)cr, \/q G Q g 


5 -> 


cBr 


5{qi,c) 


9 (gj,Z) and 5(g m ,r 


, Z) 3 q n 


(gi,gn) 


cfaj,Z,q n }r 


5 -> 


cr 




9 (g,-,Z) and S(q j} r, 


Z) 3 q n 


(gi,gn) 


cr, Vq G Q g 


5 -> 


BcBr 


%i,c) 


3 (qj,Z) and 5(g m ,r 


, Z) 3 q n 


(qi, W,q n ) 


(q,qi)c{qj,Z,q m )r, 












y q eQ q ,W(£T 


5 -> 


cBr 


%i,c) 


3 (qj,Z) and 5(g m ,r 


, Z) 3 q n 


fa, w, q n ) 


-» cfo.Z, g m )r,Vg G Q 9 ,Ty G r 


5 -> 


Bcr 




3 (qj,Z) and 5(qj,r, 


Z) 3 q n 


(g, W, q n ) 


-» (g, g«)cr, Vg G Q g , g r 


£ -> 


Bs 


$(qh,s] 


3 q m 




(q,W,q m ) 


-» (g,g ft )s,VgGQ ? ,iy GT 


5 -> 


s 


S(qj,s) 


3 q m 




faj, Z,q m ) 


— > s, vw g r 



grammar constructed may not be reduced (i.e. some nonterminal may be unreachable from the axiom or
it may not derive any terminal string). In that case the useless nonterminals and rules can be removed by
well-known algorithms (e.g. in [15]).

G is a Floyd grammar 

By construction all the rules are in operator form. To verify that the operator precedence matrix $M$ is
conflict-free, it suffices to compute the relevant terminal sets and the matrix entries using the previous
definitions. It should be enough to show one case.

For the rule $\langle q_0, q_n\rangle \to \langle q_0, q_i\rangle c \langle q_j, Z, q_m\rangle r$ the set $\mathcal{R}_G(\langle q_0, q_i\rangle) \subseteq \Sigma_i \cup \Sigma_r$ produces the relations
$s \gtrdot c$, $r \gtrdot c$. The sets $\mathcal{L}_G(\langle q_j, Z, q_m\rangle) \subseteq \Sigma_i \cup \Sigma_c$, $\mathcal{R}_G(\langle q_j, Z, q_m\rangle) \subseteq \Sigma_i \cup \Sigma_r$ determine $c \lessdot c$, $c \lessdot s$
and $s \gtrdot r$, $r \gtrdot r$; the right part of the rule gives $c \doteq r$. Thus we obtain a conflict-free matrix $M \subseteq M_T$ where
$M_T$ is the total matrix in Fig. 2.

Fig. 3 reproduces the string of Fig. 1 with precedence relations between letters that are consecutive or
Table 4: Productions for nonterminals of class Z.



case 


transitions 


rules 


Z — > 


cZ 




3 (Pj, Z u) 




(pi,Pf) 


c (Pj,Pf)>VPf G ^ 


Z — > 


c 




3 (p f ,Zu),p f G F 




\Pi,Pf) 


— > c 


Z — > 


BcZ 




3 {ph,Zu) 




/ \ 

\P,Pf) ~ 


-* (P,Pj)c(ph,Pf), Vpf G F,p€Q p 


z — > 


BC 


6(pj,c) 


9 (pf, z u),pj e f 




(p,Pf) ~ 


i \ 

(p,p,)c 


z — > 


BcBr 


o{Pi,cj 


9 (Pj,Z) and (5(p m ,r 


, Z) 3 p n 


(p,Pf) 
Mr, G O 


-> {p,pi)c{pj,Z,p m )r, 
,p/ t .r 


z -» 


cBr 


S(pi,c) 


3 (pj,Z) and 5 (p m ,r 


, Z) 3 p n 


(Pi,Pf) 


-> c(j>j, Z,p n )r, Mpf G F 


z -» 


Bar 


8(pi,c) 


3 (pj,Z) and 5(pj,r, 


Z) 3 p n 


(p,Pf) ~ 


-> (p,pi)cr, Mp £Q p , Vp/ G F 


z -» 


cr 


S(pi,c) 


3 (pj,Z) and 5{pj,r, 


Z) 3 p n 


(PnPf) 


-» cr, Vp G Q p ,p/ G F 


z -» 


Bs 


5(pj,s) 


3 pj, p/Gf 




(p,Pf) ~ 


-» (p,Pj)s,yp G 


z -> 


s 


8(pj,s) 


3 pf ,pj G F 




(PiPf) ~ 


-> s 



separated by a nonterminal. 
Proof that L(G) = L(A) 

It is obtained by a fairly natural induction showing the double implication between computations and
derivations. It is structured into several "macro-steps" mirroring the factorization introduced in Lemma 3.2.
We develop in detail only a sample of the various cases, since the others are similar.

1. (qi, _L) A (qj, J_) <^=> {qi,qj) A x, x G (S r U Ej)*. 

2. (gi,cr) A (qj,a) (qi,qj) => x, x G X* and well-balanced. 

3. (pj, cr) A (pj, cr) <^=^ (pi,Pj) A x, z G £* and well-balanced. 

4. (pi,.LZ&) Afe,±Z^) <=► (pi, Pi >4c". 

(pi, Z, pj) A ui, where w is a 
cr 



5. V7 G T*, Z, (pj, -L7Z) i-> (p.-, I7Z) (without ever popping Z) <J=^> 

well-balanced string. 
Induction base: 

(a) 5( Pi , c) 3 (p k , Z) A 6(p k ,r, Z) 3 p r <^ 3W : (p t , W,pj) -» 

(b) <5(p;,s) 9 Pi : (p^W^j) -> s 
From the inductive hypotheses: 

(a) (ft, ±7^ A (p fc ,_L 7 Ty) <R,Ph) A x,x G S* 

(b) (pfc,l 7 lV) ^ (p t ,± 7 WZ) 

c 

(c) (pt,±7WZ) A (p r ,±_jWZ) (p t ,Z,p r ) A^ 

nil 

(d) (p r ,± 7 WZ) 1 ^ (pi,± 7 W) 
Figure 3: Precedence relations between letters during the parsing of the string of Fig. 1. The dummy string delimiters $\vdash$, $\dashv$ by
hypothesis respectively yield and take precedence over any other letter.

we derive:

$(p_i, \bot\gamma W) \stackrel{w}{\longmapsto} (p_j, \bot\gamma W) \iff \langle p_i, W, p_j\rangle \stackrel{*}{\Rightarrow} w, \quad w = x c w_1 r \qquad (1)$

Special cases, such as $x = \varepsilon$ and many others, can be similarly treated. □
N.B. Each inductive proof of the
various assertions may exploit other assertions in the inductive steps. For instance the inductive hypothesis
(a) above is based on assertion 3.

A natural question is whether every FG defines a VPDA language or not. 

Theorem 3.4. The VPDA language family is strictly included in the FG family. 

Proof The language

$L = \{b^n c^n \mid n \ge 1\} \cup \{f^n d^n \mid n \ge 1\} \cup \{e^n (fb)^n \mid n \ge 1\}$

is a FG language but not a VPDA language. $L$ is generated by the FG grammar

$S \to A \mid B \mid C \qquad A \to bAc \mid bc \qquad B \to fBd \mid fd \qquad C \to eCfb \mid efb$

which has precedence relations $M$:

$b \doteq c,\ f \doteq d,\ e \doteq f,\ f \doteq b,\ b \lessdot b,\ f \lessdot f,\ e \lessdot e,\ c \gtrdot c,\ d \gtrdot d,\ b \gtrdot f$

From $b^n c^n \subseteq L$ it follows that $b$ must be a call and $c$ a return. For similar reasons, $f$ must be a call and $d$ a
return. From $e^n (fb)^n \subseteq L$ it follows that at least one of $b$ and $f$ must be a return, a contradiction for a VP
alphabet.
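
The precedence relations listed in the proof can be recomputed mechanically from the grammar. The Python sketch below rests on an encoding that is not part of the paper and is assumed only for illustration: each symbol is one character, the keys of the dictionary are the nonterminals, the grammar is in operator form without empty rules, and the relations are written as the strings "<", "=" and ">".

```python
def terminal_sets(rules, leftmost):
    """Left (leftmost=True) or right terminal sets of every nonterminal."""
    sets = {nt: set() for nt in rules}
    changed = True
    while changed:                              # iterate up to a fixpoint
        changed = False
        for lhs, bodies in rules.items():
            for body in bodies:
                seq = body if leftmost else body[::-1]
                found = set()
                if seq[0] in rules:             # body starts with a nonterminal
                    found |= sets[seq[0]]
                    if len(seq) > 1:
                        found.add(seq[1])       # operator form: a terminal follows
                else:
                    found.add(seq[0])
                if not found <= sets[lhs]:
                    sets[lhs] |= found
                    changed = True
    return sets


def precedence_matrix(rules):
    left = terminal_sets(rules, leftmost=True)
    right = terminal_sets(rules, leftmost=False)
    matrix = {}

    def add(a, rel, b):
        matrix.setdefault((a, b), set()).add(rel)

    for bodies in rules.values():
        for body in bodies:
            for i, x in enumerate(body):
                if x in rules:
                    continue                    # only terminals index the matrix
                nxt = body[i + 1] if i + 1 < len(body) else None
                if nxt is not None and nxt not in rules:
                    add(x, "=", nxt)            # adjacent terminals
                elif nxt is not None:
                    for b in left[nxt]:
                        add(x, "<", b)          # x yields to the left set of nxt
                    if i + 2 < len(body):
                        add(x, "=", body[i + 2])
                if i > 0 and body[i - 1] in rules:
                    for a in right[body[i - 1]]:
                        add(a, ">", x)          # right set takes precedence over x
    return matrix


GRAMMAR = {
    "S": ["A", "B", "C"],
    "A": ["bAc", "bc"],
    "B": ["fBd", "fd"],
    "C": ["eCfb", "efb"],
}
M = precedence_matrix(GRAMMAR)
```

Every entry of `M` holds exactly one relation, so the matrix is conflict-free, and the ten entries coincide with the relations listed above.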

FG with a partitioned precedence matrix 

We prove that the OPM structure obtained in the proof of Theor. 3.3 is a sufficient condition for an FG
to generate a VPDA language, thus obtaining a complete characterization of VPDA as a subclass of FG.
For an alphabet $\Sigma$, let $M_T$ be an OPM such that there exists a partition of $\Sigma$ into three subsets $\Sigma_1$, $\Sigma_2$ and
$\Sigma_3$ satisfying the conditions:

$\forall a \in \Sigma_1, \forall b \in \Sigma_1 \cup \Sigma_3 : M_T[a, b] = \lessdot$ and $\forall a \in \Sigma_1, \forall b \in \Sigma_2 : M_T[a, b] = \doteq$.
$\forall a \in \Sigma_2, \forall b \in \Sigma : M_T[a, b] = \gtrdot$
$\forall a \in \Sigma_3, \forall b \in \Sigma : M_T[a, b] = \gtrdot$

Then $M_T$ is termed a total VP-matrix representing the VP alphabet $\Sigma = (\Sigma_1, \Sigma_2, \Sigma_3) = (\Sigma_c, \Sigma_r, \Sigma_i)$,
shown in Fig. 2. Any OPM $M \subseteq M_T$ is termed a VP-matrix.
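
The definition lends itself to a direct check. The Python sketch below builds the total VP-matrix of a given partition and searches, by brute force, for the partitions that make a given OPM a VP-matrix. The dictionary encoding of matrices, the relation strings "<", "=", ">" and all function names are assumptions made for illustration; the search is exponential in the alphabet size and is meant for small examples only.

```python
from itertools import product


def total_vp_matrix(calls, returns, internals):
    """Total VP-matrix of the VP alphabet (calls, returns, internals)."""
    m = {}
    for a in calls:
        for b in calls | internals:
            m[(a, b)] = {"<"}
        for b in returns:
            m[(a, b)] = {"="}
    for a in returns | internals:
        for b in calls | returns | internals:
            m[(a, b)] = {">"}
    return m


def is_vp_matrix(m, calls, returns, internals):
    """True iff m is contained in the total VP-matrix of the partition."""
    total = total_vp_matrix(calls, returns, internals)
    return all(rels <= total.get(pair, set()) for pair, rels in m.items())


def vp_partitions(m, alphabet):
    """All partitions (as dicts with keys 'c', 'r', 'i') making m a VP-matrix."""
    letters = sorted(alphabet)
    found = []
    for kinds in product("cri", repeat=len(letters)):
        parts = {k: {x for x, kind in zip(letters, kinds) if kind == k}
                 for k in "cri"}
        if is_vp_matrix(m, parts["c"], parts["r"], parts["i"]):
            found.append(parts)
    return found


# A small matrix over {c, r} and the matrix M from the proof of Theorem 3.4.
DYCK = {("c", "r"): {"="}, ("c", "c"): {"<"}, ("r", "r"): {">"}, ("r", "c"): {">"}}
THEOREM_3_4 = {
    ("b", "c"): {"="}, ("f", "d"): {"="}, ("e", "f"): {"="}, ("f", "b"): {"="},
    ("b", "b"): {"<"}, ("f", "f"): {"<"}, ("e", "e"): {"<"},
    ("c", "c"): {">"}, ("d", "d"): {">"}, ("b", "f"): {">"},
}
```

The search finds exactly one partition for `DYCK` (c a call, r a return) and none for `THEOREM_3_4`, because $b \doteq c$ forces $b$ to be a call while $f \doteq b$ forces it to be a return. This only shows that this particular matrix is not a VP-matrix; the statement about the language itself is the one proved in Theorem 3.4.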

Observe that, for any grammar $G$ such that $OPM(G)$ is a VP-matrix, any rule $A \to \alpha$ has $|\alpha|_\Sigma \le 2$.
The possible stencils (or skeletons) of the right parts of the rules are $NcN$, $NcNr$, $Nr$, $Ns$, and those
obtained by erasing one or more $N$. Notice that the stencils $rN$, $crN$ are forbidden because $r$ does not yield
precedence to any letter. It follows that, for any FG having a VP-matrix, the length of any right part is $\le 4$.

Theorem 3.5. Let $G$ be an FG such that $OPM(G)$ is a VP-matrix. Then $L(G)$ is a VPDA language.

Proof First we argue that the grammar generates any string in $L(G)$ with a syntax structure corresponding
to the factorization presented in Lemma 3.2. Then, in Lemma 3.6, we construct a VPDA equivalent to $G$.
Let $G$ satisfy the hypotheses of Theor. 3.5. For every string $x \in L(G)$, the syntax tree induces the
factorization

$x = y c_0 z$ or $x = y$, $\quad y = u_1 w_1 u_2 w_2 \cdots u_k w_k$, $\quad z = v_1 c_1 v_2 c_2 \cdots c_{r-1} v_r$

where all terms are as in Lemma 3.2, and its syntax tree has the structure shown in Fig. 1. It suffices to
consider that the precedence relations of the VP-matrix completely determine the skeleton of the syntax tree
(see Fig. [?]).

Lemma 3.6. Let $G = (\Sigma, V_N, P, S)$ satisfy the hypotheses of Theor. 3.5. Then $L(G)$ is recognized by a
VPDA automaton $\mathcal{A} = (\Sigma, Q, q_0, \Gamma, \delta, Q_F)$, which can be effectively constructed.

Proof We specify how to construct from the grammar rules a VPDA $\mathcal{A}$ that recognizes by final state and
for convenience is nondeterministic. We recall that the rule stencils are just the ones previously listed.
We set $Q = V_N \cup \{q_0, p, q_F\}$, where $q_0, p, q_F \notin V_N$. The pushdown vocabulary is

$\Gamma = \left( (V_N \cup \{-\}) \times \Sigma_c \times (V_N \cup \{-\}) \right) \cup \{\bot, Z_U\}$

Intuitively, $\mathcal{A}$ is built in such a way that it enters a state $B \in V_N$ after finishing the scanning of a substring
syntactically rooted in $B$.

In state $B$, reading a symbol $c \in \Sigma_c$ (the only ones that yield precedence), $\mathcal{A}$ enters state $p$ and pushes on
the stack a symbol, for which two cases occur. The symbol is $Z_U$, if the $c$ is not to be matched by an $r$; it
is $\langle B, c, C\rangle$, if the machine "looks for" a well-balanced string $w$ such that $C \stackrel{*}{\Rightarrow} w$. Simpler special cases
also occur, such that $\mathcal{A}$ pushes on the stack a symbol $\langle B, c, -\rangle$ or $\langle -, c, -\rangle$, "looking" directly for $r$.
In state $p$, reading a $c$, $\mathcal{A}$ remains in the state and pushes on the stack either the symbol $Z_U$ if the $c$ is not to
be matched, or a symbol $\langle -, c, C\rangle$ if it "looks for" a string $w$ such that $C \stackrel{*}{\Rightarrow} w$.

Finally we describe the moves that read $r \in \Sigma_r$. If the stack is empty, the machine enters a state $A$
associated to a nonterminal. If the top of stack is a symbol $\langle B, c, C\rangle$, the machine pops the stack and enters
a state $A$. Here too some simpler special cases exist.

The final states set is defined as $Q_F = \{A \mid S \stackrel{*}{\Rightarrow} \alpha A, A \neq S\} \cup \{q_F\} \cup \{q_0 \text{ iff } S \to \varepsilon \in P\}$. Notice that
a rule $A \to cB$ can be used only in a derivation such as $S \stackrel{*}{\Rightarrow} \alpha A \Rightarrow \alpha cB \stackrel{*}{\Rightarrow} x$, otherwise $c$ would take
precedence over some other letter. Thus, $A$ and $B$ are both in $Q_F$.

The transition relation $\delta$ is then built from $P$ according to Table 5. Notice that the derivations $S \stackrel{*}{\Rightarrow} A\alpha$
needed in section 1 of the table can be effectively computed.
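Table 5: Transition relation $\delta$ of A.

section  rules                                             $\delta$
1        $A \to s$                                         $(q_0, s, A)$
         $A \to r$, such that $S \Rightarrow^{*} A\alpha$  $(q_0, r, \bot, A)$
2        $A \to s$                                         $(p, s, A)$
         $A \to Bs$                                        $(B, s, A)$
         $A \to Br$                                        $(B, r, \bot, A)$
3        $A \to cB$                                        $(p, c, p, Z_u)$
         $S \to BcC$                                       $(B, c, p, Z_u)$
4        $S \to BcCr$                                      $(B, c, p, \langle B, c, C \rangle)$, $(C, r, \langle B, c, C \rangle, q_F)$
         $S \to s$                                         $(q_0, s, q_F)$
         $S \to c$                                         $(q_0, c, Z_u, q_F)$
         $S \to r$                                         $(q_0, r, \bot, q_F)$
         $A \to BcCr$                                      $(B, c, p, \langle B, c, C \rangle)$, $(p, r, \langle B, c, C \rangle, A)$
         $A \to Bcr$                                       $(B, c, p, \langle B, c, - \rangle)$, $(p, r, \langle B, c, - \rangle, A)$
5        $A \to cBr$                                       $(p, c, p, \langle -, c, B \rangle)$, $(B, r, \langle -, c, B \rangle, A)$
         $A \to cr$                                        $(p, c, p, \langle -, c, - \rangle)$, $(B, r, \langle -, c, - \rangle, A)$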

The proof of the equivalence $L(A) = L(G)$ somewhat mirrors the equivalence proof of Theor. 3.3. For
instance, from section 2 of Table 5 the following lemma immediately follows:

$A \Rightarrow^{*} w,\ w \in (\Sigma_i \cup \Sigma_r)^{*} \iff \exists \sigma \in (\Gamma \setminus \{\bot\})^{*},\ t \in Q$ such that $(t, \bot\sigma) \overset{w}{\vdash} (A, \bot\sigma)$

Similarly, the lemma

$A \Rightarrow^{*} w,\ w$ well balanced $\iff \exists \sigma \in (\Gamma \setminus \{\bot\})^{*},\ t \in Q$ such that $(t, \bot\sigma) \overset{w}{\vdash} (A, \bot\sigma)$

can be proved by a natural induction, taking as the basis the cases $A \to cr$ and $A \to s$, and then exploiting
for the induction steps sections 2, 4, and 5 of Table 5. Further details of the proof are omitted as fairly
obvious.
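
The move formats used in Table 5 (internal moves, push moves on a call, pop moves on a return) can be exercised
with a small simulator. The following Python sketch is only an illustration and not the construction of the
lemma: the function names, the tuple layouts, the `BOTTOM` marker standing for the empty stack, and the toy
automaton for $\{c^n r^n \mid n \geq 1\}$ are all assumptions introduced here.

```python
BOTTOM = "⊥"  # assumed marker: what a return move "sees" when the stack is empty


def vpda_accepts(word, calls, returns, initial, finals,
                 internal_moves, push_moves, pop_moves):
    """Nondeterministic simulation over sets of (state, stack) configurations.

    internal_moves: set of (q, s, q2)
    push_moves:     set of (q, c, q2, gamma)  -- push gamma when reading call c
    pop_moves:      set of (q, r, gamma, q2)  -- pop gamma (or see BOTTOM) on return r
    Acceptance is by final state, regardless of the stack content.
    """
    configs = {(initial, ())}
    for a in word:
        nxt = set()
        for state, stack in configs:
            if a in calls:
                for (q, c, q2, gamma) in push_moves:
                    if q == state and c == a:
                        nxt.add((q2, stack + (gamma,)))
            elif a in returns:
                top = stack[-1] if stack else BOTTOM
                for (q, r, gamma, q2) in pop_moves:
                    if q == state and r == a and gamma == top:
                        # An empty stack stays empty; otherwise the top is popped.
                        nxt.add((q2, stack[:-1] if stack else stack))
            else:
                for (q, s, q2) in internal_moves:
                    if q == state and s == a:
                        nxt.add((q2, stack))
        configs = nxt
    return any(state in finals for state, _ in configs)


def toy_accepts(word):
    """Hypothetical automaton for c^n r^n, n >= 1 (not taken from the paper)."""
    push = {("q0", "c", "p", "Z"), ("p", "c", "p", "X")}
    pop = {("p", "r", "X", "m"), ("m", "r", "X", "m"),
           ("p", "r", "Z", "f"), ("m", "r", "Z", "f")}
    return vpda_accepts(word, {"c"}, {"r"}, "q0", {"f"}, set(), push, pop)
```

The first pushed symbol `Z` differs from the later ones so that the final state `f` is reached only when the
return matching the first call is read; this is a design choice of the toy example, not of automaton A.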

Second, we remark that various subclasses of VPDA languages recently considered correspond to restrictions
on the VP-precedence matrix and/or on the stencils of the grammar rules. A nice illustration is the
family BALAN [2]. First, balanced grammars do not allow any $c_i$ or $r_i$ to be unmatched. Thus an FG
such that no rule has the stencils $N c_i N$, $N c_i$, $c_i N$, $c_i$, $N r_i$ ensures the balancing property. Second, balanced
grammars do not allow a $c_i$ to be matched by distinct returns $r_j$ (and similarly for $r_i$). An FG such
that $|\Sigma_c| = |\Sigma_r|$ and the OPM submatrix identified by rows $\Sigma_c$ and columns $\Sigma_r$ contains $\doteq$ only on the
diagonal ensures the bijection of call and return letters.
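
The two conditions above can be checked mechanically on a grammar given as data. The following Python sketch
assumes a representation chosen here for illustration only: rules as `(lhs, rhs)` pairs with `rhs` a tuple of
symbols, calls and returns as lists whose equal positions form the intended diagonal, and the $\doteq$ relation as
a set of `(call, return)` pairs. The names `stencil` and `satisfies_balan_conditions` are hypothetical.

```python
# Stencils that leave a call or a return unmatched, as listed in the text.
FORBIDDEN_STENCILS = {"NcN", "Nc", "cN", "c", "Nr"}


def stencil(rhs, nonterminals, calls, returns):
    """Abstract a right-hand side: N = nonterminal, c = call, r = return, s = internal."""
    out = []
    for x in rhs:
        if x in nonterminals:
            out.append("N")
        elif x in calls:
            out.append("c")
        elif x in returns:
            out.append("r")
        else:
            out.append("s")
    return "".join(out)


def satisfies_balan_conditions(rules, nonterminals, calls, returns, equal_pairs):
    # Condition 1: no rule has a stencil with an unmatched call or return.
    for _, rhs in rules:
        if stencil(rhs, nonterminals, calls, returns) in FORBIDDEN_STENCILS:
            return False
    # Condition 2: as many calls as returns, and the equal-in-precedence
    # relation between calls and returns occurs only on the diagonal.
    if len(calls) != len(returns):
        return False
    diagonal = set(zip(calls, returns))
    submatrix = {(a, b) for (a, b) in equal_pairs if a in calls and b in returns}
    return submatrix <= diagonal
```

For instance, the rules `S -> ( S )` and `S -> ( )` with the single pair `("(", ")")` pass both checks, while
adding `S -> ( S` fails the first one.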

4. Closure properties 

All families considered here (except DPDA) share the property of being boolean algebras, for suitably 
defined subsets. The core of the property dates back to the original approach by McNaughton and the
"structure preserving" operations as in [9]. Other closure properties possessed by VPDA, though relevant
and classical, have been less investigated. It appears that all the previous families more general than VPDA 
lack some closure properties, as shown in the next table. 



family    boolean operations                           concatenation, Kleene star    reversal
VPDA [1]  yes for a fixed VP alphabet                  yes for a fixed VP alphabet   yes
FG        yes for compatible precedence matrices [9]   yes                           yes (proved here)
HRDPDA    yes for H-synchronized languages [18]        no³                           no (proved here)



The reversal of an FG language is generated by the mirror-reversed rules, which form an FG grammar whose
matrix is obtained by interchanging the yield- and take-precedence relations.
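
As a minimal sketch of this construction, the following Python code reverses every right-hand side and builds
the reversed matrix. The representation is an assumption made here: the matrix is a dictionary from pairs of
terminals to one of the ASCII stand-ins `<` (yields), `=` (equal in precedence), `>` (takes), and the reversed
matrix is read as also exchanging the two letters of each pair, since adjacent letters swap places in a reversed
string.

```python
LT, EQ, GT = "<", "=", ">"  # ASCII stand-ins for yield, equal, take precedence
_SWAP = {LT: GT, GT: LT, EQ: EQ}


def reverse_rules(rules):
    """Mirror every right-hand side; rules are (lhs, rhs) with rhs a tuple."""
    return [(lhs, tuple(reversed(rhs))) for lhs, rhs in rules]


def reverse_matrix(opm):
    """Map each entry (a, b) -> rel to (b, a) -> rel with yield and take exchanged."""
    return {(b, a): _SWAP[rel] for (a, b), rel in opm.items()}


# Toy grammar S -> a S b | a b with relations a < a, a = b, b > b.
RULES = [("S", ("a", "S", "b")), ("S", ("a", "b"))]
OPM = {("a", "a"): LT, ("a", "b"): EQ, ("b", "b"): GT}
```

On the toy grammar, the reversed rules are `S -> b S a | b a` and the reversed matrix has `b < b`, `b = a`,
`a > a`, which are exactly the relations one derives directly from the reversed rules.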

We observe that the boolean closure of FG languages has been proved in [9] by extending McNaughton's
method for parenthesis languages. It states that the union of two FG languages having compatible precedence
matrices is an FG language with a compatible matrix, and similarly for the other operators. We notice that this is
not implied by the closure property [18] of the equivalence class of H-synchronized HDPDA languages,
although two FG's having compatible matrices are necessarily H-synchronized².

On the other hand, the closure of VPDA languages for a given VP alphabet, under the boolean operators and
under reversal, is an immediate consequence of the same properties of the family of FG languages having
compatible precedence relations.

Since HRDPDA = RDPDA, the non-closure under reversal follows from a classical counterexample, used
for proving the same for deterministic languages: the reversal of $\{1 a^n b^n \mid n > 0\} \cup \{2 a^n b^{2n} \mid n > 0\}$ is
non-deterministic.

The proofs of the concatenation and Kleene star closures for FG's are more intricate than for other traditional
families of CF grammars, due to the need to preserve the operator structure and the precedence relations.
For Kleene star the property requires the assumption that the $\doteq$ relation does not contain cycles.
In conclusion, the FG family is currently the one, among the existing VPDA generalizations, that preserves
the majority, and possibly the totality, of VPDA closure properties.



2 For brevity we omit the natural construction of the HDPDA equivalent to a FG grammar. 
3 A complete proof is under development. 
5. Conclusions

We mention some open questions raised by the present study. 
FG appears at present to be the family that preserves the majority, and possibly the totality, of VPDA closure 
properties, but we wonder whether more general families can be found with the same properties. 

In a different direction, it is possible to transfer to VPDA a rather surprising invariance property of FG.
We recall the definition of Non-Counting (NC) context-free grammar [7], which extends the notion of NC regular
language [19]. L = L(G) is NC if, for the parenthesized language L(G), the following condition holds:
$\exists n > 0 : \forall x, v, w, \bar{v}, y \in \Sigma^{*}$, where $w$ and $v w \bar{v}$ are well-parenthesized, and $\forall m > 0$, $x v^{n} w \bar{v}^{n} y \in L$ if,
and only if, $x v^{n+m} w \bar{v}^{n+m} y \in L$. In general, two equivalent CF grammars may differ with respect to the
NC property. However, if an FG grammar is NC, then all equivalent FG grammars are NC [8]. Consider
now, for a VPDA language $L \subseteq \Sigma^{*}$, two equivalent VPDA recognizers. Notice that the two VP alphabets may differ with
respect to the 3-partition of the letters. The two corresponding FG's (Theor. 3.3) may differ in precedence
relations, but they are either both NC or both counting. We wonder whether such an invariance property holds
for other families of grammars generalizing VPDA.
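
The NC condition quantifies over all strings and all m, so it cannot be verified by testing; still, a bounded
check on one choice of strings makes the definition concrete and can exhibit a violation. The Python sketch
below is an illustration under assumptions made here: the helper names, the two toy languages over the letters
`(` and `)`, and the bound `max_m` do not come from the paper.

```python
def nc_holds_up_to(member, x, v, w, vbar, y, n, max_m):
    """Check, for m = 1..max_m only, that membership of x v^n w vbar^n y
    agrees with membership of x v^(n+m) w vbar^(n+m) y."""
    base = member(x + v * n + w + vbar * n + y)
    return all(
        member(x + v * (n + m) + w + vbar * (n + m) + y) == base
        for m in range(1, max_m + 1)
    )


def is_dyck(s):
    """Well-parenthesized strings over '(' and ')'."""
    depth = 0
    for ch in s:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0


def even_nest(s):
    """Toy counting language: (^k )^k with k even and k > 0."""
    k = len(s) // 2
    return k > 0 and k % 2 == 0 and s == "(" * k + ")" * k
```

With `v = "("`, `vbar = ")"` and a well-parenthesized `w`, the Dyck language passes the bounded check, whereas
`even_nest` fails it for every n, because adding one more nesting level flips the parity of the depth.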

Last, it would be interesting to assess the suitability of Floyd languages for the applications that have
motivated balanced grammars and VPDA. We observe that the greater generative capacity of FG's makes it
possible to define more realistic recursively nested structures. For instance, the VPDA approach uses single letters
to represent a call c and the corresponding return r, but this is just an abstraction. In real programming
languages a call is a string typically containing the name of the invoked procedure and possibly a list of
parameters. Also, as suggested by the example in the proof of Theor. 3.4, a return corresponding to a
given call may use the same letters as some other call. This will cause conflicts in the partitioning of $\Sigma$, but
can be dealt with by suitable precedence relations. Similar examples can be found in the area of mark-up
languages.

Finally, for application in model checking, the computational complexity of the decision problems for FG 
languages should be studied. 